ARMA/ARIMA/SARIMA
Stationarity, Differencing, Model Fitting, Diagnostics, Forecasting, and Validation
ARIMA - CPI
I chose the ARIMA model for forecasting economic indicators because it suits series without a pronounced seasonal component, which is the case for indicators such as CPI, GDP, the USD index, the unemployment rate, and the mortgage rate. These indicators typically have underlying trends or cycles that ARIMA can address through differencing, which makes the data stationary before the autoregressive and moving-average components are applied to capture the remaining dependence in the data.
Data Processing
Stationarity Check
Initial assessments via ACF and Augmented Dickey-Fuller tests indicated that CPI required differencing due to non-stationarity. After converting to a time series and applying logarithmic transformation, first differencing was insufficient in detrending, but second differencing indicated stationarity.
Augmented Dickey-Fuller Test Results:
Test Statistic: -0.8219849 P-value: 0.959648
The time series is not stationary based on the ADF test.
********************************************************************************************
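The transformation described above (log, then first and second differences) can be sketched as follows. This is a minimal illustration in plain Python rather than the R code that produced this report; the `cpi` values and the `difference` helper are hypothetical placeholders, not the actual CPI series or the report's functions.

```python
import math

def difference(series, lag=1):
    """Return the lag-`lag` difference: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical index levels, for illustration only (not the report's data).
cpi = [100.0, 100.4, 101.0, 101.9, 103.1, 104.6]

log_cpi = [math.log(v) for v in cpi]
d1 = difference(log_cpi)   # first difference: approximate growth rate
d2 = difference(d1)        # second difference: change in the growth rate

# Each round of differencing drops one observation.
print(len(log_cpi), len(d1), len(d2))
```

A series that still trends after one difference, as reported here for log(CPI), is differenced again, which corresponds to d = 2 in the models fitted below.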
Model Fitting
After second differencing CPI, the ACF shows three significant lags, while the PACF shows four. This suggests the candidate ARIMA orders p = [0,1,2,3], d = [2], q = [0,1,2]. I’ll test these for the lowest AIC, BIC, and AICc, and cross-check the result against auto.arima before forecasting.
Model fitting with minimum AIC (columns: p, d, q, AIC, BIC, AICc):
3, 2, 2, -8246.31447088773, -8217.381286544, -8246.2222645211
Model fitting with minimum AICc (columns: p, d, q, AIC, BIC, AICc):
3, 2, 2, -8246.31447088773, -8217.381286544, -8246.2222645211
Model fitting with minimum BIC (columns: p, d, q, AIC, BIC, AICc):
0, 2, 2, -8242.21881165827, -8227.75221948641, -8242.19255345258
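The three criteria used for the selection above can be reproduced from a model's log-likelihood. The sketch below uses the standard definitions and checks them against the ARIMA(3,2,2) row. Two inputs are assumptions: the log-likelihood is the rounded value 4129.16 printed in the diagnostics output for that model, and n = 918 is inferred from the 913 degrees of freedom plus five coefficients reported there. The function name is hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, AICc and BIC from a log-likelihood.

    k counts every estimated parameter, including the innovation variance.
    n is the number of observations used in the fit (after differencing).
    """
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    return aic, aicc, bic

# ARIMA(3,2,2): 5 coefficients + sigma^2 = 6 parameters.
# loglik is rounded and n = 918 is inferred, so expect small rounding gaps.
aic, aicc, bic = information_criteria(loglik=4129.16, k=6, n=918)
print(round(aic, 2), round(aicc, 2), round(bic, 2))

# Per-observation AIC, for comparison with the normalized $AIC value.
print(round(aic / 918, 4))
```

Under these assumptions the results agree with the reported row to within rounding. The normalized `$AIC`, `$AICc`, and `$BIC` values in the diagnostics output appear to be the same totals divided by n. BIC penalizes each parameter by log(n) rather than 2, which is why it favors the smaller ARIMA(0,2,2) here.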
auto.arima
Series: ts
ARIMA(1,2,1)(2,0,0)[12]
Coefficients:
ar1 ma1 sar1 sar2
0.2957 -0.8775 -0.2088 -0.1736
s.e. 0.0413 0.0217 0.0347 0.0362
sigma^2 = 6.977e-06: log likelihood = 4149.36
AIC=-8288.72 AICc=-8288.66 BIC=-8264.61
Model Diagnostics
The ARIMA(3,2,2) model fits satisfactorily: its residuals show no pattern and no remaining autocorrelation, although its ar3 coefficient is only marginally significant (p = 0.0477). The ARIMA(0,2,2) model likewise has white-noise residuals and minimal autocorrelation, and both of its coefficients are clearly significant. The SARIMA(1,2,1)(2,0,0)[12] model selected by auto.arima also fits well, with significant coefficients, and it has lower AIC, BIC, and AICc values than ARIMA(0,2,2) (for example, an AIC of -8288.72 versus -8242.22). Nonetheless, the simpler ARIMA(0,2,2) is preferred because it fits adequately with fewer parameters. The choice between these two models is a trade-off between parsimony and a lower information criterion. The three output blocks below correspond to ARIMA(3,2,2), ARIMA(0,2,2), and SARIMA(1,2,1)(2,0,0)[12], in that order.
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ar2 ar3 ma1 ma2"
[8] " 1.1965 -0.3280 0.0728 -1.7608 0.7647"
[9] "s.e. 0.0650 0.0541 0.0367 0.0567 0.0545"
[10] ""
[11] "sigma^2 estimated as 7.243e-06: log likelihood = 4129.16, aic = -8246.31"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 913"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 1.1965 0.0650 18.4168 0.0000"
[19] "ar2 -0.3280 0.0541 -6.0621 0.0000"
[20] "ar3 0.0728 0.0367 1.9823 0.0477"
[21] "ma1 -1.7608 0.0567 -31.0300 0.0000"
[22] "ma2 0.7647 0.0545 14.0428 0.0000"
[23] ""
[24] "$AIC"
[25] "[1] -8.982913"
[26] ""
[27] "$AICc"
[28] "[1] -8.982842"
[29] ""
[30] "$BIC"
[31] "[1] -8.951396"
[32] ""
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ma1 ma2"
[8] " -0.5605 -0.2529"
[9] "s.e. 0.0317 0.0330"
[10] ""
[11] "sigma^2 estimated as 7.327e-06: log likelihood = 4124.11, aic = -8242.22"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 916"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ma1 -0.5605 0.0317 -17.7039 0"
[19] "ma2 -0.2529 0.0330 -7.6707 0"
[20] ""
[21] "$AIC"
[22] "[1] -8.978452"
[23] ""
[24] "$AICc"
[25] "[1] -8.978438"
[26] ""
[27] "$BIC"
[28] "[1] -8.962693"
[29] ""
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ma1 sar1 sar2"
[8] " 0.2957 -0.8775 -0.2088 -0.1736"
[9] "s.e. 0.0413 0.0217 0.0347 0.0362"
[10] ""
[11] "sigma^2 estimated as 6.926e-06: log likelihood = 4149.36, aic = -8288.72"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 914"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 0.2957 0.0413 7.1618 0"
[19] "ma1 -0.8775 0.0217 -40.4427 0"
[20] "sar1 -0.2088 0.0347 -6.0251 0"
[21] "sar2 -0.1736 0.0362 -4.7917 0"
[22] ""
[23] "$AIC"
[24] "[1] -9.029109"
[25] ""
[26] "$AICc"
[27] "[1] -9.029062"
[28] ""
[29] "$BIC"
[30] "[1] -9.002845"
[31] ""
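Each row of the `$ttable` output above is the estimate divided by its standard error, with a two-sided p-value. The sketch below reproduces the ar3 row of the ARIMA(3,2,2) table. It uses a normal approximation instead of the t distribution (an assumption that is reasonable with roughly 900 degrees of freedom), and it starts from the rounded figures printed in the table, so the last digits differ slightly. The function name is hypothetical.

```python
import math

def t_and_p(estimate, se):
    """t-value and two-sided p-value under a normal approximation."""
    t = estimate / se
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal.
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

# ar3 from the ARIMA(3,2,2) table: rounded estimate and standard error.
t, p = t_and_p(0.0728, 0.0367)
print(round(t, 3), round(p, 4))
```

A p-value this close to 0.05 is why ar3 is best described as marginally significant.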
Equation for ARIMA(0,2,2):
\[(1 - B)^2 X_t = (1 + \theta_1 B + \theta_2 B^2) W_t\]
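Expanding the differencing operator, \((1 - B)^2 = 1 - 2B + B^2\), makes the recursion explicit. Assuming \(X_t\) denotes the log-transformed CPI series and substituting the rounded estimates reported for this model (ma1 = -0.5605, ma2 = -0.2529), the fitted model is approximately:

\[X_t = 2X_{t-1} - X_{t-2} + W_t - 0.5605\,W_{t-1} - 0.2529\,W_{t-2}\]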
Series: ts
ARIMA(0,2,2)
Coefficients:
ma1 ma2
-0.5605 -0.2529
s.e. 0.0317 0.0330
sigma^2 = 7.364e-06: log likelihood = 4124.11
AIC=-8242.22 AICc=-8242.19 BIC=-8227.75
Training set error measures:
ME RMSE MAE MPE MAPE
Training set -2.196212e-05 0.0027077 0.001861392 -0.0006268637 0.04498296
MASE ACF1
Training set 0.0528366 0.0009203494
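For reference, the first five training-set measures above follow the usual definitions, with errors taken as actual minus fitted. The sketch below is a minimal version of those definitions; the `actual` and `fitted` values are hypothetical and are not taken from the CPI fit.

```python
import math

def accuracy(actual, fitted):
    """In-sample ME, RMSE, MAE, MPE and MAPE (the last two in percent)."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
    }

# Hypothetical values for illustration only.
actual = [4.00, 4.10, 4.20, 4.30]
fitted = [4.02, 4.09, 4.18, 4.33]
print(accuracy(actual, fitted))
```

ME and MPE are signed, so positive and negative errors cancel; they measure bias, while RMSE, MAE, and MAPE measure the size of the errors.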
Forecasting
The graph shows log(CPI) over time, extending from the historical data into the forecast period. The black line is the historical log(CPI), whose general upward trend indicates that CPI has been rising. The blue shaded region starting around 2020 marks the forecast, with the shading showing the prediction intervals around the predicted values.
Benchmark Method
The ARIMA forecasts (red line) track the actual data most closely. The training-set accuracy metrics support this: ARIMA has the lowest RMSE, MAE, MAPE, and MASE, and its residual ACF1 is closest to zero. The Mean, Naive, and Seasonal Naive models have higher errors, and the Drift model performs better than those three but still trails ARIMA on RMSE, MAE, MAPE, and MASE. On the signed bias measures, the Mean and Drift models have an ME closer to zero and the Drift model a slightly smaller MPE in magnitude, though ARIMA's bias is also negligible. Overall, ARIMA is the best of these models for forecasting CPI, with the caveat that these are in-sample measures.
ARIMA Model Accuracy Metrics:
ME RMSE MAE MPE MAPE
Training set -2.196212e-05 0.0027077 0.001861392 -0.0006268637 0.04498296
MASE ACF1
Training set 0.0528366 0.0009203494
Mean Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set -9.418417e-18 0.8627205 0.7913189 -4.129501 19.08201 22.46201
ACF1
Training set 0.9974046
Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.002891558 0.004483797 0.003403071 0.06783973 0.08035247 0.096598
ACF1
Training set 0.5749759
Seasonal Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.03439707 0.04412258 0.03522921 0.7988339 0.8224095 1
ACF1
Training set 0.9848684
Random Walk with Drift Model Accuracy Metrics:
ME RMSE MAE MPE MAPE
Training set 1.391738e-16 0.003426854 0.002408751 0.0001517385 0.05821684
MASE ACF1
Training set 0.0683737 0.5749759
Model with the best accuracy metrics (in order: ME, RMSE, MAE, MPE, MAPE, MASE, ACF1):
ARIMA ARIMA ARIMA Mean ARIMA ARIMA ARIMA
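The "best model" row above is consistent with taking the smallest signed value of each metric, which is why the Mean model wins on MPE with -4.13. For signed measures (ME, MPE, ACF1) the value closest to zero is usually the better one. The sketch below, using the training-set figures copied from the CPI tables above, compares both rules. The function names are hypothetical and this is not the report's own selection code.

```python
NAMES = ["ME", "RMSE", "MAE", "MPE", "MAPE", "MASE", "ACF1"]

# Training-set metrics copied from the CPI tables above.
ROWS = {
    "ARIMA": [-2.196212e-05, 0.0027077, 0.001861392, -0.0006268637,
              0.04498296, 0.0528366, 0.0009203494],
    "Mean": [-9.418417e-18, 0.8627205, 0.7913189, -4.129501,
             19.08201, 22.46201, 0.9974046],
    "Naive": [0.002891558, 0.004483797, 0.003403071, 0.06783973,
              0.08035247, 0.096598, 0.5749759],
    "Seasonal Naive": [0.03439707, 0.04412258, 0.03522921, 0.7988339,
                       0.8224095, 1.0, 0.9848684],
    "Drift": [1.391738e-16, 0.003426854, 0.002408751, 0.0001517385,
              0.05821684, 0.0683737, 0.5749759],
}

def best_signed(rows, names):
    """Pick the model with the smallest signed value per metric."""
    return [min(rows, key=lambda m: rows[m][i]) for i in range(len(names))]

def best_by_magnitude(rows, names):
    """Pick the model whose value is closest to zero per metric."""
    return [min(rows, key=lambda m: abs(rows[m][i])) for i in range(len(names))]

print(best_signed(ROWS, NAMES))        # matches the row reported above
print(best_by_magnitude(ROWS, NAMES))  # differs on ME and MPE
```

By magnitude, the Mean model has the ME closest to zero and the Drift model the smallest MPE, while ARIMA remains best on RMSE, MAE, MAPE, MASE, and ACF1.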
ARIMA - GDP
Data Processing
Stationarity Check
Initial assessments via ACF and Augmented Dickey-Fuller tests indicated that GDP required differencing due to non-stationarity. After converting to a time series and applying logarithmic transformation, first differencing was insufficient in detrending, but second differencing indicated stationarity.
Augmented Dickey-Fuller Test Results:
Test Statistic: -1.049504 P-value: 0.929467
The time series is not stationary based on the ADF test.
********************************************************************************************
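To illustrate what the unit-root test measures, the sketch below computes the plain Dickey-Fuller t-statistic from the regression of the first difference on the lagged level with a constant. It is a simplified stand-in, not the augmented test used for the results above: it has no lagged differences and no trend term, and the two input series are simulated. A statistic well below the asymptotic 5% critical value for the constant-only case (about -2.86) points to stationarity. The function name is hypothetical.

```python
import math
import random

def dickey_fuller_t(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + e[t]."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = md - gamma * mx
    sse = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of gamma
    return gamma / se

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # simulated stationary series

walk, level = [], 0.0                               # simulated random walk
for e in noise:
    level += e
    walk.append(level)

print(dickey_fuller_t(noise), dickey_fuller_t(walk))
```

The stationary series gives a strongly negative statistic, while the random walk does not, which is the pattern the log(GDP) result above shows before differencing.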
Model Fitting
After second differencing GDP, the ACF shows one significant lag, while the PACF shows five. This suggests the candidate ARIMA orders p = [1,2,3,4,5], d = [1,2], q = [1]. I’ll test these for the lowest AIC, BIC, and AICc, and cross-check the result against auto.arima before forecasting.
Model fitting with minimum AIC (columns: p, d, q, AIC, BIC, AICc):
1, 1, 1, -1871.92391061098, -1857.04266350455, -1871.79057727765
Model fitting with minimum AICc (columns: p, d, q, AIC, BIC, AICc):
1, 1, 1, -1871.92391061098, -1857.04266350455, -1871.79057727765
Model fitting with minimum BIC (columns: p, d, q, AIC, BIC, AICc):
1, 1, 1, -1871.92391061098, -1857.04266350455, -1871.79057727765
auto.arima
Series: ts
ARIMA(0,2,3)(1,0,1)[4]
Coefficients:
ma1 ma2 ma3 sar1 sma1
-0.8748 -0.0010 -0.1093 -0.6082 0.5539
s.e. 0.0572 0.0756 0.0562 0.3597 0.3727
sigma^2 = 0.0001251: log likelihood = 935.79
AIC=-1859.59 AICc=-1859.31 BIC=-1837.29
Model Diagnostics
The ARIMA(1,1,1) model with a constant has the lowest information criteria of the candidates, and its residuals suggest it captures the underlying process well, although its ma1 coefficient is not statistically significant (p = 0.1646). The second model, the SARIMA(0,2,3)(1,0,1)[4] chosen by auto.arima, is more complex and does not fit better: its information criteria are slightly higher (an AIC of -1859.59 versus -1871.92), and several of its coefficients (ma2, sar1, sma1) are not statistically significant. Because the two models use different orders of differencing, their information criteria are only roughly comparable. The first model is preferred for its simplicity and performance.
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ma1 constant"
[8] " 0.4412 -0.3052 0.0076"
[9] "s.e. 0.2101 0.2191 0.0008"
[10] ""
[11] "sigma^2 estimated as 0.0001232: log likelihood = 939.96, aic = -1871.92"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 302"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 0.4412 0.2101 2.1002 0.0365"
[19] "ma1 -0.3052 0.2191 -1.3930 0.1646"
[20] "constant 0.0076 0.0008 9.6004 0.0000"
[21] ""
[22] "$AIC"
[23] "[1] -6.137455"
[24] ""
[25] "$AICc"
[26] "[1] -6.137194"
[27] ""
[28] "$BIC"
[29] "[1] -6.088664"
[30] ""
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ma1 ma2 ma3 sar1 sma1"
[8] " -0.8748 -0.0010 -0.1093 -0.6082 0.5539"
[9] "s.e. 0.0572 0.0756 0.0562 0.3597 0.3727"
[10] ""
[11] "sigma^2 estimated as 0.0001227: log likelihood = 935.79, aic = -1859.59"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 299"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ma1 -0.8748 0.0572 -15.2823 0.0000"
[19] "ma2 -0.0010 0.0756 -0.0137 0.9891"
[20] "ma3 -0.1093 0.0562 -1.9463 0.0526"
[21] "sar1 -0.6082 0.3597 -1.6908 0.0919"
[22] "sma1 0.5539 0.3727 1.4861 0.1383"
[23] ""
[24] "$AIC"
[25] "[1] -6.117068"
[26] ""
[27] "$AICc"
[28] "[1] -6.116405"
[29] ""
[30] "$BIC"
[31] "[1] -6.043705"
[32] ""
Equation for ARIMA(1,1,1):
\[(1 - \phi B)(1 - B)X_t = (1 + \theta B)W_t\]
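Multiplying out the left-hand side, \((1 - \phi B)(1 - B) = 1 - (1 + \phi)B + \phi B^2\), gives the recursion in terms of past values:

\[X_t = (1 + \phi)X_{t-1} - \phi X_{t-2} + W_t + \theta W_{t-1}\]

Substituting the rounded estimates from the fit without a constant reported just below (ar1 = 0.9997, ma1 = -0.9836) gives approximately:

\[X_t = 1.9997\,X_{t-1} - 0.9997\,X_{t-2} + W_t - 0.9836\,W_{t-1}\]

With ar1 this close to 1, the autoregressive factor behaves almost like a second difference, which may be how this fit absorbs the trend that the constant captures in the diagnostics fit above.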
Series: ts
ARIMA(1,1,1)
Coefficients:
ar1 ma1
0.9997 -0.9836
s.e. 0.0006 0.0103
sigma^2 = 0.0001276: log likelihood = 934.67
AIC=-1863.34 AICc=-1863.26 BIC=-1852.18
Training set error measures:
ME RMSE MAE MPE MAPE
Training set -0.0001737751 0.01123841 0.007111044 -0.001027974 0.08112324
MASE ACF1
Training set 0.2069521 0.1241003
Forecasting
The forecast shows a projected increase in log-transformed GDP, and the historical data indicate a good model fit. The widening prediction intervals reflect greater uncertainty at longer horizons. While useful for economic planning, these predictions assume that past trends continue unchanged and may not account for unforeseen economic events.
Benchmark Method
The ARIMA model seems to closely follow the actual trend, along with the Drift model. The other models—Mean, Naive, and Seasonal Naive—diverge from the actual trend as time progresses, indicating less accuracy.
On the training-set accuracy metrics, the ARIMA model has the lowest MAE, MAPE, and MASE and the residual ACF1 closest to zero, while the Random Walk with Drift model has a marginally lower RMSE (0.01122745 versus 0.01123841). The Mean model performs worst, with the highest errors. The Naive and Seasonal Naive models also show higher errors than ARIMA but are better than the Mean model. Because the Drift model's metrics are so close to ARIMA's, both are good fits, and together they are the best models for this dataset on these metrics.
ARIMA Model Accuracy Metrics:
ME RMSE MAE MPE MAPE
Training set -0.0001737751 0.01123841 0.007111044 -0.001027974 0.08112324
MASE ACF1
Training set 0.2069521 0.1241003
Mean Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 1.560046e-17 0.6856741 0.5943983 -0.6000709 6.726537 17.29873
ACF1
Training set 0.9903157
Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.007608786 0.01356279 0.01029104 0.08636799 0.1165192 0.2994994
ACF1
Training set 0.1336013
Seasonal Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.03061763 0.04003529 0.03436081 0.3470782 0.388953 1 0.798235
Random Walk with Drift Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 3.814826e-16 0.01122745 0.0071468 0.001228953 0.08148863 0.2079927
ACF1
Training set 0.1336013
Model with the best accuracy metrics (in order: ME, RMSE, MAE, MPE, MAPE, MASE, ACF1):
ARIMA Drift ARIMA Mean ARIMA ARIMA ARIMA
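The MASE column appears to be each model's MAE scaled by the in-sample MAE of the seasonal naive forecast, which would explain why the Seasonal Naive model's MASE is exactly 1. The sketch below checks this reading against the GDP figures reported above; the function name is hypothetical.

```python
def mase(model_mae, benchmark_mae):
    """Mean absolute scaled error: model MAE divided by a benchmark MAE."""
    return model_mae / benchmark_mae

# Training-set MAE values copied from the GDP tables above.
seasonal_naive_mae = 0.03436081
arima = mase(0.007111044, seasonal_naive_mae)
drift = mase(0.0071468, seasonal_naive_mae)

print(round(arima, 5), round(drift, 5))
```

Both ratios agree with the reported MASE values (0.2069521 and 0.2079927) to within rounding: on this scale, the ARIMA and Drift errors are each about one fifth of the seasonal naive error.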
SARIMA - 3 Month Treasury Bill
SARIMA models are used for Treasury yields such as 3-month T-bills or 20-year T-bonds because they can capture regular fluctuations that recur through the year. These fluctuations can stem from the timing of government borrowing, shifts in how often people invest, and rules that banks must follow at certain times. Capturing these patterns makes it easier to anticipate where yields will go next, which matters to investors in these securities.
Data Processing
Stationarity Check
Initial assessments via ACF and Augmented Dickey-Fuller tests indicated that the 3-month T-bill yield required differencing due to non-stationarity. After converting to a time series, first differencing and seasonal differencing were each insufficient on their own, but the combination of non-seasonal and seasonal differencing indicated strong stationarity.
Augmented Dickey-Fuller Test Results:
Test Statistic: -3.318366 P-value: 0.07045011
The time series is not stationary based on the ADF test.
********************************************************************************************
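Combining non-seasonal and seasonal differencing, as described above, applies the operator (1 - B)(1 - B^4) to a quarterly series. The sketch below shows the effect on a hypothetical series made of a linear trend plus a fixed quarterly pattern; the data and the `difference` helper are illustrative only, not the T-bill series or the report's code.

```python
def difference(series, lag=1):
    """Return the lag-`lag` difference: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: linear trend plus a repeating seasonal pattern.
pattern = [0.3, -0.1, 0.2, -0.4]
y = [0.5 * t + pattern[t % 4] for t in range(16)]

seasonal_diff = difference(y, lag=4)     # (1 - B^4): removes the quarterly pattern
both = difference(seasonal_diff, lag=1)  # (1 - B)(1 - B^4): also removes the trend

print(len(y), len(seasonal_diff), len(both))
```

Seasonal differencing alone leaves a constant level here (the trend times four quarters), and the additional first difference removes it. Together the two steps cost five observations, which corresponds to d = 1 and D = 1 in the models below.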
Model Fitting
After first and seasonal differencing of the 3-month T-bill yield, the ACF shows three significant lags, while the PACF shows four. This suggests the candidate SARIMA orders p = [0,1,2,3], P = [2], d = [1], D = [1], q = [0,1,2], Q = [1]. I’ll test these for the lowest AIC, BIC, and AICc, and cross-check the result against auto.arima before forecasting.
Columns: p, d, q, P, D, Q, AIC, BIC, AICc
Minimum AIC: 1, 1, 1, 2, 1, 1, -77.5526287298902, -58.9901275250496, -77.0141671914286
Minimum BIC: 1, 1, 1, 2, 1, 1, -77.5526287298902, -58.9901275250496, -77.0141671914286
Minimum AICc: 1, 1, 1, 2, 1, 1, -77.5526287298902, -58.9901275250496, -77.0141671914286
auto.arima
Series: ts
ARIMA(3,1,0)(2,0,0)[4]
Coefficients:
ar1 ar2 ar3 sar1 sar2
-0.0812 -0.0820 -0.2835 -0.1593 -0.0340
s.e. 0.0814 0.0789 0.0780 0.0905 0.0897
sigma^2 = 0.03345: log likelihood = 49.13
AIC=-86.25 AICc=-85.73 BIC=-67.54
Model Diagnostics
In the model fitting analysis, two prominent models emerged: SARIMA(3,1,0)(2,0,0)[4] and SARIMA(1,1,1)(2,1,1)[4]. The diagnostic comparison for these models focuses on evaluating key summary statistics, including the presence of a white noise pattern in the residuals, p-values, and critical model selection criteria like AIC, AICc, and BIC.
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ma1 sar1 sar2 sma1"
[8] " 0.8182 -1.0000 -0.1960 -0.1040 -0.8524"
[9] "s.e. 0.0520 0.0256 0.0993 0.0982 0.0615"
[10] ""
[11] "sigma^2 estimated as 0.03124: log likelihood = 44.78, aic = -77.55"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 158"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 0.8182 0.0520 15.7209 0.0000"
[19] "ma1 -1.0000 0.0256 -39.1200 0.0000"
[20] "sar1 -0.1960 0.0993 -1.9733 0.0502"
[21] "sar2 -0.1040 0.0982 -1.0593 0.2911"
[22] "sma1 -0.8524 0.0615 -13.8632 0.0000"
[23] ""
[24] "$AIC"
[25] "[1] -0.475783"
[26] ""
[27] "$AICc"
[28] "[1] -0.4734384"
[29] ""
[30] "$BIC"
[31] "[1] -0.3619026"
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ar2 ar3 sar1 sar2 constant"
[8] " -0.0871 -0.0869 -0.2877 -0.1652 -0.0364 0.0067"
[9] "s.e. 0.0816 0.0790 0.0780 0.0908 0.0896 0.0080"
[10] ""
[11] "sigma^2 estimated as 0.03231: log likelihood = 49.47, aic = -84.94"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 161"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 -0.0871 0.0816 -1.0681 0.2871"
[19] "ar2 -0.0869 0.0790 -1.0994 0.2733"
[20] "ar3 -0.2877 0.0780 -3.6873 0.0003"
[21] "sar1 -0.1652 0.0908 -1.8203 0.0706"
[22] "sar2 -0.0364 0.0896 -0.4065 0.6849"
[23] "constant 0.0067 0.0080 0.8386 0.4029"
[24] ""
[25] "$AIC"
[26] "[1] -0.5086453"
[27] ""
[28] "$AICc"
[29] "[1] -0.5055015"
[30] ""
[31] "$BIC"
[32] "[1] -0.3779509"
Equation for SARIMA(1,1,1)(2,1,1)[4]:
\[(1 - \phi B)(1 - B)(1 - \Phi_1 B^s - \Phi_2 B^{2s})(1 - B^s)X_t = (1 + \theta B)(1 + \Theta_1 B^s)W_t\]
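The left-hand side of a multiplicative SARIMA model can be expanded by multiplying its operator polynomials in B. The sketch below does this for s = 4, assuming the standard form in which the seasonal AR polynomial is 1 - Phi1 B^s - Phi2 B^{2s}, and using the rounded estimates reported for this model (ar1 = 0.8182, sar1 = -0.1960, sar2 = -0.1040). The `poly_mul` helper is hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = power)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

s = 4
phi, Phi1, Phi2 = 0.8182, -0.1960, -0.1040  # rounded estimates for this model

ar = [1.0, -phi]                             # 1 - phi B
diff = [1.0, -1.0]                           # 1 - B
# 1 - Phi1 B^4 - Phi2 B^8
sar = [1.0] + [0.0] * (s - 1) + [-Phi1] + [0.0] * (s - 1) + [-Phi2]
sdiff = [1.0] + [0.0] * (s - 1) + [-1.0]     # 1 - B^4

lhs = poly_mul(poly_mul(ar, diff), poly_mul(sar, sdiff))

# Coefficient of each power of B applied to X_t, from B^0 up to the top lag.
for power, coef in enumerate(lhs):
    print(power, round(coef, 4))
```

The expanded operator reaches lag 14 (2 non-seasonal lags plus 12 seasonal ones), so each value of the series depends on up to fourteen earlier values even though only three AR coefficients are estimated.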
Series: ts
ARIMA(1,1,1)(2,1,1)[4]
Coefficients:
ar1 ma1 sar1 sar2 sma1
0.8182 -1.0000 -0.1960 -0.1040 -0.8524
s.e. 0.0520 0.0256 0.0993 0.0982 0.0615
sigma^2 = 0.03223: log likelihood = 44.78
AIC=-77.55 AICc=-77.01 BIC=-58.99
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.01547569 0.1740971 0.1211176 37.62815 140.1717 0.5557939
ACF1
Training set -0.009057403
Forecasting
The forecasting plot shows the historical and forecasted quarterly values of the 3-month Treasury yield. The historical data, drawn as a solid black line, span from 1980 to around 2020, where the forecast begins as a dotted line. The blue shaded area around the forecast marks the prediction intervals, the range within which the future yield is expected to fall. This area widens as the forecast extends further into the future, reflecting growing uncertainty in the model’s predictions.
Benchmark Method
The benchmark plot compares several forecasting models over the period from 1980 to the near present. On the training-set metrics, the SARIMA model has the lowest RMSE, MAE, and MASE and the residual ACF1 closest to zero, which suggests it is the most accurate overall, although the Naive model has a lower MAPE (112.86 versus 140.17).
SARIMA Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set -0.01547569 0.1740971 0.1211176 37.62815 140.1717 0.5557939
ACF1
Training set -0.009057403
Mean Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set -1.183749e-17 0.3317336 0.2505159 3.214205 311.7334 1.149587
ACF1
Training set 0.8236012
Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.006370865 0.188809 0.1231996 17.7233 112.8604 0.565348
ACF1
Training set -0.02926909
Seasonal Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.0353719 0.3282686 0.2179181 59.57998 334.7433 1 0.6614068
Random Walk with Drift Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 2.94903e-17 0.1887014 0.1234148 20.7863 112.9409 0.5663355
ACF1
Training set -0.02926909
Model with the best accuracy metrics (in order: ME, RMSE, MAE, MPE, MAPE, MASE, ACF1):
SARIMA SARIMA SARIMA Mean Naive SARIMA Drift
Cross Validation
The graph compares the Mean Absolute Error of two models across a multi-step forecast horizon. The one-step-ahead cross-validation results below favor Model 1, which has both the lower MAE (0.0901 versus 0.0967) and the lower MSE (0.0195 versus 0.0226), indicating better predictive accuracy at that horizon.
One-Step Ahead Cross Validation:
Model 1 - MAE: 0.09014297 MSE: 0.01947199
Model 2 - MAE: 0.0967174 MSE: 0.02260441
Model 1 performs better on both MAE and MSE.
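One-step-ahead cross-validation of this kind can be sketched with an expanding window: fit on the first t observations, forecast observation t + 1, and average the errors. The report does not say which routine or which two models produced the numbers above, so the forecasters below (naive and drift) are stand-ins, the data are hypothetical, and the function names are assumptions.

```python
def one_step_cv(y, forecaster, min_train):
    """Expanding-window one-step-ahead MAE and MSE (errors are actual - forecast)."""
    errors = []
    for t in range(min_train, len(y)):
        errors.append(y[t] - forecaster(y[:t]))
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse

def naive(history):
    """Forecast the last observed value."""
    return history[-1]

def drift(history):
    """Forecast the last value plus the average historical change."""
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical series with a steady trend

print(one_step_cv(y, naive, min_train=2))  # (1.0, 1.0)
print(one_step_cv(y, drift, min_train=2))  # (0.0, 0.0)
```

Because every forecast is scored on an observation the model has not seen, these errors are a fairer basis for comparing models than the training-set metrics in the benchmark tables.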
Multi Step Cross Validation:
SARIMA - 6 Month Treasury Bill
Data Processing
Stationarity Check
The plots illustrate the stages of the analysis. The initial ACF plot suggests non-stationarity, and the decomposition shows the trend and seasonality of the series. After differencing, the Augmented Dickey-Fuller test is significant (p = 0.020), indicating a stationary series, and the subsequent ACF and PACF plots show that the series has stabilized and is ready for SARIMA modeling.
Augmented Dickey-Fuller Test Results:
Test Statistic: -3.805526 P-value: 0.02041494
The time series is stationary based on the ADF test.
********************************************************************************************
Model Fitting
After differencing the 6-month T-bill yield, the ACF and PACF show significance at multiple lags. This suggests the candidate SARIMA orders p = [0,1,2,3], P = [2], d = [1], D = [1], q = [0,1,2,3], Q = [1]. I’ll test these for the lowest AIC, BIC, and AICc, and cross-check the result against auto.arima before forecasting.
Columns: p, d, q, P, D, Q, AIC, BIC, AICc
Minimum AIC: 2, 1, 2, 2, 1, 1, -0.473054691655761, 24.2769469147983, 0.462010243409175
Minimum BIC: 1, 1, 1, 2, 1, 1, 5.03140449969021, 23.5939057045308, 5.56986603815175
Minimum AICc: 2, 1, 2, 2, 1, 1, -0.473054691655761, 24.2769469147983, 0.462010243409175
auto.arima
Series: ts
ARIMA(0,1,0)(2,0,0)[4]
Coefficients:
sar1 sar2
-0.2138 -0.2045
s.e. 0.0909 0.0933
sigma^2 = 0.05475: log likelihood = 6.38
AIC=-6.75 AICc=-6.61 BIC=2.6
Model Diagnostics
In the model fitting analysis, three prominent models emerged: SARIMA(0,1,0)(2,0,0)[4], SARIMA(2,1,2)(2,1,1)[4] and SARIMA(1,1,1)(2,1,1)[4]. The diagnostic comparison for these models focuses on evaluating key summary statistics, including the presence of a white noise pattern in the residuals, p-values, and critical model selection criteria like AIC, AICc, and BIC.
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " sar1 sar2 constant"
[8] " -0.2142 -0.2046 0.0022"
[9] "s.e. 0.0910 0.0933 0.0128"
[10] ""
[11] "sigma^2 estimated as 0.05408: log likelihood = 6.39, aic = -4.78"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 164"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "sar1 -0.2142 0.0910 -2.3553 0.0197"
[19] "sar2 -0.2046 0.0933 -2.1921 0.0298"
[20] "constant 0.0022 0.0128 0.1753 0.8611"
[21] ""
[22] "$AIC"
[23] "[1] -0.02864336"
[24] ""
[25] "$AICc"
[26] "[1] -0.02776168"
[27] ""
[28] "$BIC"
[29] "[1] 0.04603913"
[30] ""
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ar2 ma1 ma2 sar1 sar2 sma1"
[8] " -0.1820 0.8030 -0.1146 -0.8854 -0.2283 -0.2178 -0.9541"
[9] "s.e. 0.0611 0.0563 0.0685 0.0667 0.0991 0.0979 0.0638"
[10] ""
[11] "sigma^2 estimated as 0.04746: log likelihood = 8.24, aic = -0.47"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 156"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 -0.1820 0.0611 -2.9789 0.0034"
[19] "ar2 0.8030 0.0563 14.2623 0.0000"
[20] "ma1 -0.1146 0.0685 -1.6739 0.0961"
[21] "ma2 -0.8854 0.0667 -13.2651 0.0000"
[22] "sar1 -0.2283 0.0991 -2.3029 0.0226"
[23] "sar2 -0.2178 0.0979 -2.2244 0.0276"
[24] "sma1 -0.9541 0.0638 -14.9487 0.0000"
[25] ""
[26] "$AIC"
[27] "[1] -0.002902176"
[28] ""
[29] "$AICc"
[30] "[1] 0.001530834"
[31] ""
[32] "$BIC"
[33] "[1] 0.1489383"
[34] ""
[1] "Call:"
[2] "arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), "
[3] " include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, "
[4] " REPORT = 1, reltol = tol))"
[5] ""
[6] "Coefficients:"
[7] " ar1 ma1 sar1 sar2 sma1"
[8] " -0.9902 0.8509 -0.2971 -0.2563 -0.9561"
[9] "s.e. 0.0226 0.0712 0.0916 0.0946 0.0585"
[10] ""
[11] "sigma^2 estimated as 0.05234: log likelihood = 3.48, aic = 5.03"
[12] ""
[13] "$degrees_of_freedom"
[14] "[1] 158"
[15] ""
[16] "$ttable"
[17] " Estimate SE t.value p.value"
[18] "ar1 -0.9902 0.0226 -43.8806 0.0000"
[19] "ma1 0.8509 0.0712 11.9567 0.0000"
[20] "sar1 -0.2971 0.0916 -3.2442 0.0014"
[21] "sar2 -0.2563 0.0946 -2.7084 0.0075"
[22] "sma1 -0.9561 0.0585 -16.3309 0.0000"
[23] ""
[24] "$AIC"
[25] "[1] 0.03086751"
[26] ""
[27] "$AICc"
[28] "[1] 0.03321209"
[29] ""
[30] "$BIC"
[31] "[1] 0.1447479"
[32] ""
Equation for SARIMA(2,1,2)(2,1,1)[4]:
\[(1 - \phi_1 B - \phi_2 B^2)(1 - B)(1 - \Phi_1 B^s - \Phi_2 B^{2s})(1 - B^s)X_t = (1 + \theta_1 B + \theta_2 B^2)(1 + \Theta_1 B^s)W_t\]
Series: ts
ARIMA(2,1,2)(2,1,1)[4]
Coefficients:
ar1 ar2 ma1 ma2 sar1 sar2 sma1
-0.1820 0.8030 -0.1146 -0.8854 -0.2283 -0.2178 -0.9541
s.e. 0.0611 0.0563 0.0685 0.0667 0.0991 0.0979 0.0638
sigma^2 = 0.04959: log likelihood = 8.24
AIC=-0.47 AICc=0.46 BIC=24.28
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.004070017 0.2145945 0.1528131 104.5816 243.6773 0.5254351
ACF1
Training set 0.0078855
Forecasting
The forecasting plot illustrates the historical and predicted quarterly data for the 6 Month Treasury Yield, spanning from 1980 and extending beyond 2020. The historical data, depicted by a solid black line, shows the yield’s volatility over time. The forecast, beginning near 2020, is shown with a dotted line and a blue shaded area indicating the prediction intervals, suggesting increased uncertainty in the forecast as time progresses.
Benchmark Method
The plot compares several forecasting models for the 6-month Treasury yield. On the training-set metrics, the SARIMA model has the lowest RMSE, MAE, and MASE, indicating strong predictive accuracy. The Drift model has the lowest MPE, and the Mean model has the lowest MAPE. In contrast, the Seasonal Naive model is the least accurate, with the highest error values.
SARIMA Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.004070017 0.2145945 0.1528131 104.5816 243.6773 0.5254351
ACF1
Training set 0.0078855
Mean Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 8.108929e-19 0.3310926 0.2327435 161.2462 176.9307 0.8002689
ACF1
Training set 0.7392247
Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set 0.002112496 0.2387966 0.158006 47.22492 225.3856 0.5432904
ACF1
Training set -0.1433554
Seasonal Naive Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.01693984 0.4100165 0.2908316 240.2875 374.3727 1 0.6545677
Random Walk with Drift Model Accuracy Metrics:
ME RMSE MAE MPE MAPE MASE
Training set -3.941913e-17 0.2387872 0.1581026 42.26205 229.5025 0.5436225
ACF1
Training set -0.1433554
Model with the best accuracy metrics (in order: ME, RMSE, MAE, MPE, MAPE, MASE, ACF1):
Drift SARIMA SARIMA Drift Mean SARIMA Drift
Cross Validation
The graph displays the MAE and MSE for two models across different forecast horizons. Model 1, represented by the red line, consistently shows a lower MAE than Model 2, the green line, indicating more accurate forecasts at all points. This is supported by one-step ahead cross-validation results where Model 1 has lower MAE and MSE values than Model 2, suggesting that Model 1 has a better predictive performance.
One-Step Ahead Cross Validation:
Model 1 - MAE: 0.132274 MSE: 0.03496468
Model 2 - MAE: 0.1428235 MSE: 0.03843426
Model 1 performs better on both MAE and MSE.
Multi Step Cross Validation:
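A multi-step comparison of this kind reports the error separately for each forecast horizon. The sketch below computes MAE by horizon with an expanding window. It is an illustration under assumptions: the report does not identify its cross-validation routine or the two models, so the naive forecaster, the data, and the function names here are hypothetical.

```python
def multi_step_cv(y, forecaster, min_train, max_h):
    """MAE for each horizon h = 1..max_h from an expanding-window evaluation.

    forecaster(history, h) must return a list of h forecasts.
    """
    abs_errors = {h: [] for h in range(1, max_h + 1)}
    # Stop early enough that every origin has max_h future observations.
    for t in range(min_train, len(y) - max_h + 1):
        forecasts = forecaster(y[:t], max_h)
        for h in range(1, max_h + 1):
            abs_errors[h].append(abs(y[t + h - 1] - forecasts[h - 1]))
    return {h: sum(errs) / len(errs) for h, errs in abs_errors.items()}

def naive(history, h):
    """Repeat the last observed value for every horizon."""
    return [history[-1]] * h

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical trending series

print(multi_step_cv(y, naive, min_train=3, max_h=3))  # {1: 1.0, 2: 2.0, 3: 3.0}
```

Errors typically grow with the horizon, as they do here, so two models should be compared horizon by horizon rather than on a single pooled average.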